feat(storage): SQLite FTS5+BM25 log search #52

Merged: aksOps merged 1 commit into main from feat/sqlite-fts5-bm25 on Apr 27, 2026
aksOps commented Apr 27, 2026

Summary

  • Provisions an FTS5 virtual table (logs_fts) over logs(body, service_name) on every SQLite boot via AutoMigrateModels. Setup is idempotent and backfills pre-existing rows with FTS5's rebuild command.
  • Keeps the FTS index in sync via AFTER INSERT/DELETE/UPDATE triggers; retention purges propagate automatically.
  • Routes SearchLogs and GetLogsV2 through the FTS5 path on SQLite ordered by bm25() (lower = more relevant). Postgres keeps the existing pg_trgm GIN path; MySQL/SQL Server keep LIKE.
  • User queries are escaped and given a trailing * (prefix match) so conn still matches connection; the porter tokenizer handles English stems (panic matches panicked); unicode61 makes matching case- and accent-insensitive.
  • Transparent LIKE fallback on FTS5 query errors so a misbehaving index does not become a 500.
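
For reviewers who want to see what "escaped and given a trailing *" could look like, here is a minimal sketch. It is not the code from this PR: `buildFTSQuery` and `hasAlnum` are hypothetical names, and dropping punctuation-only terms is an assumption. It relies only on documented FTS5 query syntax: a double-quoted string is treated as literal text (embedded quotes are doubled), and a `*` after the string marks its final token as a prefix.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// buildFTSQuery turns free-form user input into an FTS5 MATCH expression.
// Hypothetical helper: each whitespace-separated term is wrapped in double
// quotes (embedded quotes doubled) so that operators such as AND/OR/NOT and
// characters like { } ( ) are read as plain text, then suffixed with * so
// the term matches as a prefix. The bool is false when no searchable term
// remains, in which case a caller might skip FTS5 altogether.
func buildFTSQuery(input string) (string, bool) {
	var parts []string
	for _, term := range strings.Fields(input) {
		if !hasAlnum(term) {
			// Assumption: terms without letters or digits produce no
			// tokens, so they are dropped rather than sent to FTS5.
			continue
		}
		quoted := `"` + strings.ReplaceAll(term, `"`, `""`) + `"`
		parts = append(parts, quoted+"*")
	}
	if len(parts) == 0 {
		return "", false
	}
	// Space-separated phrases are implicitly ANDed by FTS5.
	return strings.Join(parts, " "), true
}

// hasAlnum reports whether s contains at least one letter or digit.
func hasAlnum(s string) bool {
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return true
		}
	}
	return false
}

func main() {
	for _, in := range []string{"conn", `say "hi"`, "AND OR NOT", "* + -"} {
		q, ok := buildFTSQuery(in)
		fmt.Printf("%q -> %q (usable=%v)\n", in, q, ok)
	}
}
```

Quoting every term trades away FTS5's boolean operators for safety, which seems to match the intent of the special-character test case in the test plan.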

Why

Phase 3a of the 7-day-retention robustness initiative. Replaces full-table LIKE scans for log search on the default SQLite storage adapter with a BM25-ranked inverted index. Postgres already had pg_trgm; this brings the SQLite path to parity for substring/relevance search. Phase 3b will follow with Postgres declarative partitioning as an opt-in adapter.

Test plan

  • go test ./internal/storage/ -run 'TestFTS5|TestSearchLogs|TestGetLogsV2|TestSetupSQLite|TestSQLite_LIKE|TestPurgeLogs|TestCompressedText' — pass
  • go test ./... -race -count=1 — full tree green under race detector
  • go vet ./... — clean
  • golangci-lint run --new-from-rev=origin/main — clean
  • BM25 ordering verified (triple-occurrence row ranks ahead of single-occurrence)
  • Tenant isolation verified (FTS results still scope by tenant_id)
  • Delete trigger keeps FTS in sync after PurgeLogsBatched
  • Special-char queries (AND OR NOT, ", * + -, { } ( )) escape safely
  • setupSQLiteFTS5 is idempotent on re-run
  • LIKE fallback path retained; existing TestSearchLogs_* and TestSQLite_LIKE_CaseInsensitivityForASCII still pass

🤖 Generated with Claude Code

SQLite log search now routes through an FTS5 virtual table (`logs_fts`) over
`(body, service_name)` with `bm25()` ranking. The index is kept in sync via
AFTER INSERT/DELETE/UPDATE triggers on `logs`, so retention purges and
manual deletes propagate automatically. The setup is idempotent and
backfills existing rows on first boot via FTS5's `rebuild` command.

User input is escaped and given a trailing `*` so partial words still
match as prefixes (e.g., `conn` matches `connection`); the porter tokenizer
covers inflectional matches (`panic` matches `panicked`). On any FTS5 query
error the repository transparently falls back to LIKE so a misbehaving
index never surfaces as a 500.

Postgres keeps the existing `pg_trgm` GIN path; MySQL/SQL Server keep
LIKE. New tests cover BM25 ordering, prefix and stemming matches, tenant
isolation, the delete trigger sync, special-character escaping, and the
GetLogsV2 search path. Docs updated in CLAUDE.md and OPERATIONS.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Quality Gate failed

Failed conditions
7.8% Duplication on New Code (required ≤ 3%)

@aksOps aksOps merged commit 96ec26e into main Apr 27, 2026
16 of 17 checks passed
@aksOps aksOps deleted the feat/sqlite-fts5-bm25 branch April 27, 2026 16:33